Cancer Discovery
● American Association for Cancer Research (AACR)
Preprints posted in the last 7 days, ranked by how well they match Cancer Discovery's content profile, based on 61 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
Jacobs, L. A.
COVID-19 risk scores developed during the pandemic relied on measurements contemporaneous with infection, leaving unresolved whether the metabolic and inflammatory vulnerability they capture pre-existed as a stable trait or was triggered by acute illness. Here, using 501,946 UK Biobank participants whose blood was drawn between 2006 and 2010, at least ten years before SARS-CoV-2 emerged, we show that baseline proteomic and metabolic profiles predict both COVID-19 hospitalization (2,783 events; C-statistic = 0.676 [0.666-0.686]) and COVID-19 mortality (1,564 deaths; C-statistic = 0.730 [0.701-0.760]) from parsimonious, regularized feature sets. The IL-1 pathway index (xIL1, +0.093) was independently selected for hospitalization but not mortality, while the IL-6 trans-signaling index (xIL6, +0.040) was selected for mortality but not hospitalization. This differential pathway weighting was corroborated by independent LightGBM/SHAP analysis and mirrors the subsequent success of tocilizumab (anti-IL-6R) and the limited efficacy of anakinra (anti-IL-1R) in reducing COVID-19 mortality in randomized trials conducted years later. The mortality model was additionally characterized by central adiposity (waist-hip ratio, +0.386), a respiratory compromise index (xRSP, +0.149), and prodromal cardiovascular disease (pCVD, +0.246). These findings establish that vulnerability to a novel pathogen is, in substantial part, a pre-existing and measurable prodromal state, with implications for pandemic preparedness and population-level risk stratification.
Wang, S.; Mapar, P.; Moldovan, N.; van der Pol, Y.; Safrastyan, A.; van Werkhoven, E.; Tantyo, N. A.; Snieder, B.; Do Brito Valente, A. F.; de Jong, A. V.; Dinmohamed, A.; Drees, E. E. E.; Roemer, M. G. M.; Ylstra, B.; Klerk, C. P. W.; Strobbe, L.; Sandberg, Y.; Boersma, R. S.; Koene, H.; Pruijt, H.; de Heer, K.; van Rijn, R.; Bilgin, Y. M.; de Jongh, E.; Nijland, M.; van der Poel, M.; Koster, A.; Nieuwenhuizen, L.; Fijnheer, R.; Beeker, A.; Mous, R.; Vergote, V. K. J.; Vermaat, J. S. P.; Pegtel, D. M.; Chamuleau, M. E. D.; Mouliere, F.
Curative-intent immunochemotherapy fails in ~30% of patients with large B-cell lymphoma (LBCL), yet no validated molecular tool enables early identification of high-risk individuals to guide treatment intensification. Using shallow whole genome sequencing (sWGS) of plasma cell-free DNA from 190 LBCL patients, we developed and validated the ACT score (Aberrations, fragment Composition, Terminal motifs), a composite classifier integrating genomic and fragmentomic features from a single post-cycle-1 sample. ACT-positive patients had worse 2-year outcomes versus ACT-negative patients: time-to-progression 29% vs. 83% (HR 4.4, 95% CI 1.9-10.0; P = 1.5 x 10^-4) and overall survival 47% vs. 93% (HR 8.7, 95% CI 3.0-25.4; P = 1.8 x 10^-6). The ACT score was prognostic independently of the International Prognostic Index, and their combination identified the highest-risk patients. Unlike mutation-based approaches, this assay requires no tumor tissue, germline control, or baseline plasma sample. Built on open-source tools and sWGS, the ACT score offers a feasible, scalable strategy for early risk stratification in aggressive LBCL.
Ofordile, O. N.
Using a longitudinal cohort of 633 Gambian children (IHAT-GUT, NCT02941081), we resolve two mechanistically distinct ecological pathways linking Prevotella stercorea to infection risk. Its abundance positively predicts gut microbiome richness, consistent with community-level colonisation resistance for enteric outcomes. However, its association with reduced acute respiratory infection (ARI) persists unchanged after richness adjustment, identifying a species-autonomous pathway independent of community diversity. Weight-for-age z-score (WAZ) is uncorrelated with microbiome richness within strata, supporting WAZ as a proxy for host immune-metabolic reserve rather than a determinant of microbiome composition. In Low-WAZ children, P. stercorea at Day 1 associates with suppressed CRP, whereas in higher-WAZ children, elevated Day 1 inflammation predicts subsequent P. stercorea colonisation at Day 85, consistent with host-context-dependent immune selection. ARI and fever protection is richness-independent and concentrated in Low-WAZ children. P. copri does not retain an independent protective association when modelled jointly. These findings have direct implications for microbiome-directed interventions.
Sharma, R.; Hu, F.; Li, X.; Campos, R.; Kundu, K.; Atanur, S.; Karpinski, M.; Wasilewski, S.; MacArthur, S.; Vitsios, D.; Dhindsa, R. S.; Georgakopoulos-Soares, I.; Burren, O. S.; Petrovski, S.; Mustoe, A. M.; Wang, Q.; Glodzik, D.; Zou, X. Z.
Non-coding variants are important contributors to human traits and diseases but linking them to molecular mechanisms and phenotypes at scale remains challenging. G-quadruplexes (G4s) are four-stranded structures formed by guanine-rich sequences and have emerged as key functional elements within the non-coding genome. G4s are enriched in regulatory regions and can modulate gene expression at both the DNA and RNA levels, influencing transcription, replication, and RNA processing, positioning them as key mediators linking non-coding variation to complex biological traits. Here, we profile putative G4s across five regulatory regions in 459,449 UK Biobank genomes and perform phenome-wide association analyses spanning 2,941 plasma protein abundances, 13,321 binary traits, and 1,682 quantitative traits. We show that putative G4-modifying variants are depleted under purifying selection despite elevated local mutability and drive large, bidirectional associations with plasma proteins and clinical traits, including associations not captured by coding variants. Using a mechanism-aware collapsing strategy that groups rare non-coding variants by their predicted impact on G4 stability, we achieved stronger gene-level signals than those obtained with standard rare-variant collapsing approaches. Integrating non-coding and protein-truncating variants (PTVs) increases discovery power, revealing 843 significant associations missed by the PTV-only model. Replication in the Alliance for Genomic Discovery cohort demonstrates cross-cohort robustness. Our study suggests G4s as widespread mediators of non-coding regulation and provides a framework for mechanism-informed target discovery and prioritization across the non-coding genome.
Cifello, J.; Feng, R.; Grenn, F. P.; Carter, L.; Liu, A.; Sun, H.; Li, R.; Empawi, J. A.; Greenfest-Allen, E.; Katanic, Z.; Valladares, O.; Kuzma, A. B.; White, H.; Farrer, L. A.; Goate, A. M.; Raj, T.; Wang, M.; Cruchaga, C.; Wang, L.-S.; Klein, H.; De Jager, P. L.; Chen, H.; Marcora, E.; TCW, J.; Zhang, X.; Kuksa, P. P.; Wang, G.; Leung, Y. Y.
Understanding the regulatory consequences of genetic variation in the aging human brain requires molecular maps that span brain regions, cell types and regulatory modalities. We present the Alzheimer's Disease Sequencing Project Functional Genomics (FunGen-AD) xQTL Atlas, a harmonized resource of molecular quantitative trait loci from four postmortem brain studies, ROSMAP, MSBB, Knight-ADRC and MiGA. The atlas integrates histone acetylation, DNA methylation, gene expression, splicing and protein abundance QTLs across 14 brain regions, 7 major cell types and 17,566 samples, with standardized association, significance-filtered and fine-mapping outputs. To expand discovery beyond conventional 1-Mb cis windows, we include variants within Topologically Associating Domains (TAD) and their boundaries where appropriate, identifying on average 21% more variant-molecular-trait associations per dataset. Statistical fine-mapping reduced broad association sets by 95% into credible sets of candidate regulatory variants. Distributed through the NIAGADS xQTL portal and bulk-download services, the atlas provides a comprehensive functional-genomic foundation for interpreting genetic risk variants in Alzheimer's disease and aging-brain research.
Su, C.-Y.; Butler-Laporte, G.
Yang et al. recently published a systematic comparison of genetic effects on disease susceptibility and disease-specific mortality across nine common diseases and seven biobanks, concluding that susceptibility and survival architectures overlap only modestly. This is an important resource, but we argue that the current mortality genome-wide association studies (GWAS) require explicit power calibration before limited overlap can be interpreted biologically. Using two-sample Mendelian randomization (MR) with positive-control exposures, we show that even a well-powered positive control, body mass index (BMI), instrumented by 855 genome-wide-significant variants, produces a clearly detectable effect only for heart failure (HF) mortality, with weaker evidence for chronic kidney disease (CKD) mortality. Moreover, when BMI instruments were stratified into quartiles by exposure-association strength, the HF association remained nominally significant only in the two strongest quartiles and was not significant in the two weakest quartiles. Further, household income, used as a weakly instrumented socio-economic contrast, had insufficient power to detect moderate effects on any disease mortality outcome. These analyses indicate that current disease mortality GWAS may be insufficiently powered to detect shared effects. In contrast, the same BMI instrument set produced large and directionally coherent effects when applied to case-control GWAS of the six matched diseases, with the HF and prostate cancer associations preserved under a within-family BMI sensitivity analysis, and nominal support for CKD. The HF mortality association was also preserved in a within-family BMI sensitivity analysis. Similarly, genetically proxied household income was associated with HF risk in the case-control GWAS despite null associations with disease-specific mortality, consistent with limited power in the mortality GWAS.
These findings indicate that the limited BMI-mortality evidence across several outcomes is unlikely to reflect a weak BMI instrument or dynastic artefacts alone and instead supports limited effective power in current disease-mortality GWAS.
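The quartile-stratified analysis in this abstract can be illustrated with a minimal sketch of a fixed-effect inverse-variance-weighted (IVW) estimator. This is not the authors' code: the abstract names neither the MR estimator nor the strength metric, so IVW and ranking by absolute exposure effect size are assumptions, and the function names and toy numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-outcome effect and its SE.

    Assumption: first-order weights 1 / se_outcome**2, one entry per variant.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += weight * bx * by
        denominator += weight * bx * bx
    return numerator / denominator, math.sqrt(1.0 / denominator)


def ivw_by_strength_quartile(beta_exposure, beta_outcome, se_outcome):
    """Run IVW within quartiles of instrument strength, weakest first.

    Assumption: strength is ranked by |beta_exposure|; ranking by an
    F-statistic would be an equally plausible reading of the abstract.
    """
    order = sorted(range(len(beta_exposure)), key=lambda i: abs(beta_exposure[i]))
    n = len(order)
    results = []
    for q in range(4):
        idx = order[q * n // 4:(q + 1) * n // 4]
        results.append(ivw_estimate(
            [beta_exposure[i] for i in idx],
            [beta_outcome[i] for i in idx],
            [se_outcome[i] for i in idx],
        ))
    return results


# Toy data (hypothetical): eight variants with a true causal effect of 0.5.
bx = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
by = [0.5 * b for b in bx]
se = [0.01] * 8
estimate, standard_error = ivw_estimate(bx, by, se)
quartiles = ivw_by_strength_quartile(bx, by, se)
```

In this toy setup every quartile recovers the same point estimate, but the weakest quartile has the largest standard error, which is the kind of power loss the stratified analysis is meant to expose.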
Mao, Y.; Lopman, B.; Koelle, K.; Lau, M. S.
Accurate forecasting of seasonal influenza is critical for public health preparedness, and data-driven models are central to this effort. However, most approaches rely on aggregate indicators of influenza-like-illness (ILI), which can obscure heterogeneity and limit predictability at longer horizons. While subtype dynamics are well established, their role in data-driven forecasting remains incompletely understood. Here, we integrate subtype-resolved surveillance data into diverse data-driven frameworks using over a decade of U.S. surveillance records to evaluate and decompose predictive signal in influenza forecasting. Across pre- and post-COVID-19 periods, subtype-informed models consistently improve over baseline models trained on aggregate ILI alone, with the largest gains at longer horizons. Decomposition reveals a horizon-dependent reorganization of predictability: autoregressive persistence in recent aggregate incidence dominates at short horizons but declines with lead time, while predictive signal shifts toward subtype-derived structure. Within this structure, interaction-related features among co-circulating subtypes grow systematically with forecast horizon, indicating that longer-term predictability is driven increasingly by interaction structure rather than marginal subtype composition alone. Together, our results show that subtype information provides non-redundant predictive signal and extends the effective forecasting window of data-driven models. More broadly, our findings suggest that aggregation of heterogeneous subtype processes can obscure latent predictability, supporting subtype-resolved surveillance.
Berna, A.; Fahrmann, J.; Irajizad, E.; Rudsari, H.; Liu, Y.; Logan, J.; Murtada, K.; Grandy, J.; Edwards, M.; Ayers, A.; Ahmed, S.; Neelapu, S.; Saini, N.; John, A.; John, T.
Background: Severe cytokine release syndrome (CRS) and immune effector cell-associated neurotoxicity syndrome (ICANS) are major dose-limiting toxicities of chimeric antigen receptor (CAR) T-cell therapy. Existing pre-infusion biomarkers offer modest discrimination, motivating non-invasive alternatives. Methods: We prospectively enrolled 26 patients with relapsed/refractory large B-cell lymphoma receiving axicabtagene ciloleucel. Pre-infusion (day -1) exhaled breath samples were analyzed by gas chromatography-mass spectrometry for 40 volatile organic compounds (VOCs). Candidates with univariate AUC > 0.65 for severe (grade >=2) CRS or ICANS were carried forward to sensitivity-maximization-at-given-specificity with LASSO regularization (SMAGS-LASSO), which selected separate panels for each outcome. Model performance was assessed by leave-one-out cross-validation with permutation p-values and Harrell bootstrap optimism correction. Results: The 4-VOC CRS panel (heptanal, benzaldehyde, 2-butanone, ethylbenzene) achieved LOOCV AUC 82.5% (80% sensitivity at 88% specificity) and the 3-VOC ICANS panel (nonanal, allyl methyl sulfide, levomenthol) achieved AUC 86.3% (67% sensitivity at 86% specificity). By tertile, severe CRS occurred in 8/9 (89%) high-risk versus 2/9 (22%) low-risk patients (Cox HR 6.82, 95% CI 1.41-32.9, p=0.017) and severe ICANS occurred in 8/9 (89%) versus 2/9 (22%) (HR 8.28, 95% CI 1.73-39.6, p=0.008). Each 1-SD score increase corresponded to a 3.80-fold higher hazard of severe CRS (p<0.001) and 4.36-fold higher hazard of severe ICANS (p<0.001). In head-to-head comparison, the 3-VOC ICANS panel outperformed the modified Endothelial Activation and Stress Index (mEASIX) (delta-AUC +0.36, DeLong 1-sided p=0.008). The 4-VOC CRS panel had numerically higher AUC than mEASIX (delta-AUC +0.19, p=0.150). 
Conclusions: Pre-infusion exhaled breath VOC panels stratify CAR T-cell recipients by severity and timing of severe CRS and ICANS, providing a non-invasive complement to existing serum biomarkers. Multi-institutional validation is warranted.
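The discrimination metrics quoted in this abstract (AUC, and sensitivity at a stated specificity) can be sketched as follows. This is only an illustration of the metrics on made-up scores. It does not implement SMAGS-LASSO, leave-one-out cross-validation, or optimism correction, and the function names are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_specificity(case_scores, control_scores, min_specificity):
    """Highest sensitivity over thresholds whose specificity >= min_specificity.

    A patient is called positive when score >= threshold.
    """
    best = 0.0
    for threshold in sorted(set(case_scores) | set(control_scores)):
        specificity = sum(k < threshold for k in control_scores) / len(control_scores)
        sensitivity = sum(c >= threshold for c in case_scores) / len(case_scores)
        if specificity >= min_specificity and sensitivity > best:
            best = sensitivity
    return best


# Toy risk scores (hypothetical, not study data).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
toy_auc = auc(cases, controls)
toy_sens = sensitivity_at_specificity(cases, controls, min_specificity=0.9)
```

With only 26 patients, apparent values like these would be optimistic, which is presumably why the study reports cross-validated and optimism-corrected estimates.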
Rodriguez, X.; Perez-Jimenez, J. G.; Alexander, L. W.; Lezcano-Coba, C.; Galue, J.; Juarez, Y.; Beltran, D.; Smith, D. R.; Kadir, M.; Ali, D. W.; Corrales, R.; Trujillo Rodriguez, L.; Valdiviezo, G. E.; Thomas, Q. K.; Cicalo, A.; Fitzpatrick, M. C.; Luquette, A. E.; Cameron Sayer, L.; Cer, R. Z.; Malagon, F.; Grajales, I. A.; Rivera, L. F.; Gonzalez-R, Z.; Antioco, J.; Walters-Valdes, E.; Meneghello-Ponce, N.; Vittor, A. Y.; Escobar-Lee, K.; Abouganem-Shaw, A.; Rodriguez, F.; Aguirre, E.; Loyola, S.; Tinoco, Y.; Moreno, B.; Chen-German, M.; Ampuero, S.; Gomez-Angelo, A.; Correa-Duarte, S.; Ace
Oropouche virus (OROV) spread across the Americas in 2024, yet Panama's Darien migration corridor saw no outbreak until nearly a year after Brazil's January 2024 peak, raising two hypotheses: cryptic circulation masked by diagnostic gaps, or recent introduction under permissive climatic conditions. Here we resolve this paradox using integrated clinical, genomic, and climate-informed surveillance. Among 1,040 individuals tested, 43% were OROV-positive and showed a clinical signature distinct from co-circulating arboviruses, including headache more frequent than in dengue (RR 2.38, 95% CI 1.74-3.24). The household secondary attack rate was 56%, and waste burning independently predicted infection. Phylogeographic reconstruction identified a single recent introduction in October 2024 with no evidence of adaptive evolution, excluding prolonged cryptic persistence. Climate-informed models indicate broad outbreak susceptibility across Panama, with Bocas del Toro and Los Santos as the next highest-risk provinces. These findings identify a Central American foothold for OROV with potential for further northward spread.
Zhang, C.; Chen, Y.-L.; Jamilov, A.; Liu, E.; Shree, S.; Lam, B. D.; Foy, B. H.
Most routine clinical markers are interpreted using population-based reference intervals, despite being regulated around patient-specific homeostatic setpoints. This mismatch obscures physiologic shifts, inhibiting detection of early disease signatures. Here, we develop a novel Bayesian inference method that adaptively constructs personalized reference intervals using each patient's existing health records. In analysis of >100 million lab tests in >800,000 patients, these personalized intervals can be accurately constructed with only minimal prior data, meaning this method can be applied near universally. We show that across 43 common lab markers, patient setpoints are strongly associated with future morbidity, with signal strength increasing as more test data are collected. Deviation from personalized reference intervals provides strong and novel risk signatures across diverse disease states, including hypothyroidism, hematologic cancers, kidney disease, and pregnancy complications. Importantly, personalized reference intervals capture a different risk signature from existing population-based approaches, with the highest-risk patients being those who deviate from both intervals simultaneously. In a targeted clinical use case study of iron infusion, use of personalized reference intervals greatly improved prediction of treatment efficacy and allowed precise tracking of treatment responses. Our results illustrate how existing health records can be used to construct personalized benchmarks for nearly all common clinical tests, driving a new paradigm for precision laboratory medicine.
Garrett, M. E.; Nouraie, S. M.; Machado, R. F.; Gordeuk, V. R.; Gladwin, M. T.; NHLBI Trans-Omics for Precision Medicine Consortium; Telen, M. J.; Ashley-Koch, A. E.
In the United States, sickle cell disease (SCD) is a rare inherited hemoglobinopathy affecting about 100,000 individuals, mostly of African ancestry. SCD causes damage to multiple organ systems and SCD nephropathy (SCDN) is a common complication associated with early mortality. We previously performed a genome-wide association study (GWAS) for SCDN and identified a modest number of genome-wide significant loci. Here, we leveraged the ancestral composition of participants from two well-characterized adult SCD cohorts to boost statistical power and perform a local ancestry-aware GWAS for estimated glomerular filtration rate (eGFR), resulting in the identification of novel genome-wide significant loci within the African (AFR) and European (EUR) ancestral components of participants. Meta-analysis identified 12 significant genomic regions in the AFR tract, including PPIL6, ARHGAP24, RAB11A, and STEAP3, and 38 regions in the EUR tract, including UBLCP1, ADAMTS6, JAZF1, MYO7B, MYO1C, PDGFA, GPC5, LRP1B, KANK1, and TRPV5. The identified regions encompass genes affecting inflammation, extracellular matrix (ECM) integrity, iron metabolism, magnesium ion homeostasis, B cell apoptosis, tumor necrosis factor (TNF) production, and estrogen signaling. Many of these genes and pathways are important not only for renal function, but also for SCD biology, providing additional support for the hypothesis that SCDN pathophysiology is distinct from other forms of kidney disease. This study represents the largest local ancestry-aware analysis of SCDN to date, furthers our understanding of the genetic risk factors underlying SCDN, and proposes new targets that could be useful for the early identification and treatment of kidney dysfunction in SCD patients.
Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.
Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.
Mosquera, J. V.; Tang, I.; Murach, M.; Auguste, G.; Kodali, A.; Hart, P.; Shaw, D. M.; Li, M.; Turner, A. W.; Hodonsky, C. J.; Dworak, N. M.; de Oliveira, A. K.; Sol-Church, K.; Jhee, T.; van der Sijs, K. I. M.; Adkar, S. S.; Choi, R. B.; Vacante, F.; Wu, J. C.; Cheng, P.; Giannarelli, C.; Leeper, N. J.; Finn, A. V.; Bjorkegren, J. L. M.; Kovacic, J. C.; Yurdagul, A.; van der Laan, S. W.; Miller, C. L.
Advances in single-cell and spatial assays have revolutionized the scale and resolution of molecular tissue profiling. Here we present MetaPlaq, a multimodal atlas of human atherosclerotic arterial beds comprising over a million cells across single-cell transcriptomics, epigenomics and high-resolution spatial expression assays. We map granular cell states and disease-relevant transcriptional programs within the native tissue context of coronary arteries. Furthermore, we map cardiovascular GWAS signals to smooth muscle cells (SMCs) and endothelial cells (ECs) and uncover the cis-regulatory architecture governing their phenotypic transitions. Our comprehensive epigenomic reference allowed us to build cell-specific enhancer-gene link maps and multimodal gene regulatory networks (GRNs) underlying disease-relevant states such as osteogenic SMCs and ECs undergoing mesenchymal transition. We also integrate SMC and EC disease-associated gene sets with GRNs to nominate key transcription factors such as PRRX1, BNC2 and ELK3 regulating atherosclerosis-relevant transcriptional programs. Finally, we layer single-cell and spatial modalities to fine-map GWAS variants with improved cell and anatomical context. We highlight candidate cell-specific regulatory mechanisms at less characterized CAD loci, including FGD5 and MCF2L in ECs. Together, this atlas represents an important step towards fully interpreting genetic risk loci and informing new therapeutic strategies for cardiovascular disease.
Goodman, M. O.; Alex, R. M.; Sands, S. A.; Azarbarzin, A.; Batool-anwar, S.; Pavlova, M. K.; Epstein, L. J.; Redline, S.; Cade, B. E.
Obstructive sleep apnea (OSA) is associated with a wide range of comorbidities, but the extent to which these follow predictable, age-dependent patterns is not well understood. Identifying such patterns could provide insight into OSA heterogeneity and its links to physiological measures of OSA. We trained age-dependent topic models (ATM) on longitudinal electronic health records from 36,426 patients with OSA in the Mass General Brigham Biobank. ATM organizes incident diagnoses into distinct comorbidity "topics," whose age-specific disease loadings represent predictive patterns linking related diagnoses across the life course. We applied the trained model to compute individual-level topic scores in independent data: a cohort of 11,689 OSA cases and 22,695 matched controls, and a cohort of 6,220 patients with polysomnography (PSG)-derived physiological measures. We identified 19 distinct age-dependent comorbidity profiles, all significantly associated with OSA case status (FDR-adjusted p<0.05). Topics reflected recognizable clusters including metabolic, neuropsychiatric, and immune-mediated conditions, and several were distinguished by age-of-onset of key comorbidities, such as early- vs late-onset asthma. Seventeen of the 19 topics were significantly associated with at least one of 13 PSG-derived physiological measures, including associations between cardiometabolic topics and the apnea-hypopnea index, sleep apnea specific hypoxic burden, and respiratory event-specific heart rate burden. These findings indicate that age-dependent comorbidity patterns distinguish meaningful OSA subtypes with differing prognoses and endophenotype associations. ATM offers insight into complex OSA comorbidity and suggests that age-informed, topic-based stratification may improve individualized risk assessment, interpretation of PSG findings, and targeting of clinical interventions.
Casalino-Matsuda, S. M.; Guggilla, V.; Gao, C. A.; Demeulenaere, K. E.; Cusick, L. P.; Fenske, S. W.; Yu, Z.; Lu, Z.; Swaminathan, S.; Grant, R. A.; Schleck, M. J.; Prakriya, M.; Hebbar, S.; Stauderman, K.; Donnelly, H. K.; Pickens, C.; Morales-Nebreda, L.; The NU SCRIPT Study Investigators; Wunderink, R. G.; Misharin, A. V.; Singer, B. D.; Budinger, G. S.
Viral pneumonia is perpetuated by inflammatory circuits between activated T cells and monocyte-derived alveolar macrophages (MoAM). T cells and macrophages express ORAI1 and STIM1, which form calcium release-activated calcium (CRAC) channels that allow extracellular calcium entry in response to endoplasmic reticulum calcium store depletion. In a randomized, placebo-controlled, multicenter phase 2 trial (CARDEA), Auxora, a CRAC channel inhibitor, reduced all-cause 30-day mortality by 56% in patients with severe SARS-CoV-2 pneumonia. Here, we report a multi-omics analysis of serially collected alveolar samples from unvaccinated patients with severe SARS-CoV-2 pneumonia treated with Auxora versus placebo. We found reductions in plasma levels of the monocyte- and T cell-chemokines, CCL8 and PDGF-AA. Using peripheral blood mononuclear cells (PBMC) from healthy volunteers, we show that Auxora directly targets T cells to inhibit the transcription of CCL8 and PDGFA in monocyte-derived macrophages, supporting a mechanism for its effects and a potential intermediate biomarker of efficacy.
Zhang, K.; John, D.; Li, W. T.; Hogarth, M.; McKay, R. R.; Ongkeko, W. M.
Importance: While gut dysbiosis is known to impair response to immune checkpoint inhibitors (ICIs), the relative clinical impact of antibiotic timing (pre- vs. post-ICI initiation) remains unclear. Objective: To evaluate whether antibiotic timing differentially influences overall survival (OS) in a large, multi-institutional pan-cancer cohort. Design, Setting, and Participants: This retrospective cohort study utilized deidentified electronic health record data from six academic medical centers within the University of California Health system. We included 21,108 adults with any malignancy who received PD-1, PD-L1, or CTLA-4 inhibitors between January 2014 and December 2024. Exposures: Antibiotic exposure windows were categorized as pre-only (-60 to -1 days), post-only (+1 to +60 days), both windows, or none. Main Outcomes and Measures: The primary outcome was overall survival (OS) calculated from the first ICI dose. Multivariable Cox proportional hazards models adjusted for demographics, tumor type, line of therapy, and baseline health indicators (albumin, NLR, and recent hospitalization). Results: Among 21,108 patients, 17.3% had pre-only exposure, 13.3% had post-only exposure, and 60.6% had no exposure. In the multivariable model, post-only exposure (HR, 1.27; 95% CI, 1.20-1.35) and combined pre- and post- exposure (HR, 1.31; 95% CI, 1.23-1.40) were significantly associated with higher mortality. Pre-only exposure was not significantly associated with OS (HR, 1.04; 95% CI, 0.99-1.10). Subgroup analyses by tumor type showed consistent trends across major malignancies, including head and neck (Post HR, 1.46) and renal cell carcinoma (Post HR, 1.26). Conclusions and Relevance: In contrast to some smaller studies, this large-scale analysis indicates that antibiotic exposure after ICI initiation carries a greater risk than exposure prior to treatment. 
These findings highlight the need for rigorous antibiotic stewardship strategies specifically during the early phases of immunotherapy treatment.
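The four exposure categories defined in this abstract can be expressed as a small classifier. The windows (-60 to -1 days and +1 to +60 days relative to the first ICI dose) come from the abstract. The function name, the input format, and the handling of day 0 are assumptions for illustration.

```python
def classify_antibiotic_exposure(day_offsets):
    """Label a patient by antibiotic timing relative to the first ICI dose (day 0).

    Windows follow the abstract: pre = -60..-1 days, post = +1..+60 days.
    Assumption: offsets outside both windows, including day 0, are ignored.
    """
    pre = any(-60 <= d <= -1 for d in day_offsets)
    post = any(1 <= d <= 60 for d in day_offsets)
    if pre and post:
        return "both"
    if pre:
        return "pre-only"
    if post:
        return "post-only"
    return "none"


# Hypothetical patients: lists of antibiotic start days relative to first ICI dose.
labels = [
    classify_antibiotic_exposure([-30]),
    classify_antibiotic_exposure([14]),
    classify_antibiotic_exposure([-10, 45]),
    classify_antibiotic_exposure([-90, 0, 75]),
]
```

A label of this kind would then enter the Cox model as a categorical covariate with "none" as the reference level, which is how the reported hazard ratios are most naturally read.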
Kramer, B.; Kushner, S. A.; Rzhetsky, A.
Maternal infection, immune disease, and delivery mode are plausible influences on early brain development. We analyzed 1,179,611 US Merative MarketScan mother-child pairs (2003-2024), including 259,339 non-twin siblings in 123,926 families. Population models screened 18 perinatal exposures against 13 childhood psychiatric/neurodevelopmental diagnosis-count outcomes; sibling fixed effects tested robustness to stable family-level confounding. Cesarean delivery was associated with higher composite neurodevelopmental diagnosis counts in pairs (23.4%) and siblings (25.0%) and with ADHD in siblings (38.8%; FDR q = 0.025). Autism was elevated in pairs (20.0%) but not supported within families (5.0%; p = 0.87). Claims-defined no-labor/no-repeat cesarean showed stronger lower-risk-birth associations for composite neurodevelopmental burden (48.0%), autism (44.9%), speech/language disorders (41.0%), and ADHD (24.1%). Maternal infection/immune-mediated disease, preterm birth, and advanced maternal age were additional population signals.
Lu, S.; Ruan, X.; Wang, L.; Wang, X.; Sameer, M.; Liu, H.
Although GLP1/GIP receptor agonists demonstrate unprecedented weight loss efficacy, their rapid clinical adoption has revealed significant real-world tolerability challenges. To evaluate their dynamic safety profiles, we developed a macro-to-micro pharmacovigilance framework by combining global FAERS reports with local UT Physician EHR data. Macroscopically, we distilled 17 shared adverse events across the drug class from FAERS using disproportionality analysis. Microscopically, local EHR data (289,655 longitudinal treatment sessions across 71,316 patients) revealed that 51.6% of GLP1 sessions terminated within 90 days. Furthermore, temporally stratified logistic regression demonstrated that initial exposure (0 to 30 days) correlated strongly with nausea and vomiting, which attenuated in extended sessions, whereas extended exposure (>2 years) uncovered late-onset risks, notably incident hepatic steatosis. Ultimately, this time-aware framework reveals that GLP1 safety profiles are profoundly duration-dependent, providing critical insights into both acute intolerances and long-term medication safety.
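Disproportionality analysis of spontaneous reports is often summarized with a reporting odds ratio (ROR) from a 2x2 table. The abstract does not say which disproportionality measure was used, so the ROR here is an assumption, and the counts are invented purely to show the arithmetic.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a Wald confidence interval computed on the log scale.

    a: reports with the drug of interest and the event
    b: reports with the drug of interest, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Invented counts for illustration only (not FAERS data).
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=10, d=90)
```

A common convention treats an event as a signal only when the lower confidence bound exceeds 1. In this toy table the point estimate is elevated but the lower bound falls just below 1, so it would not qualify under that convention.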
Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.
Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
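The recommendation to report per-type rather than aggregate coverage can be illustrated with a short sketch. It assumes each knowledge graph has already been reduced to (node type, aligned identifier) pairs. The function name and the toy identifiers are hypothetical and do not come from the paper's pipeline.

```python
from collections import defaultdict


def per_type_coverage(source_nodes, target_nodes):
    """Fraction of source nodes, per node type, that also appear in the target.

    Both arguments are iterables of (node_type, identifier) pairs whose
    identifiers are assumed to be already aligned to a shared vocabulary.
    """
    target = set(target_nodes)
    total = defaultdict(int)
    matched = defaultdict(int)
    for node in set(source_nodes):
        node_type = node[0]
        total[node_type] += 1
        if node in target:
            matched[node_type] += 1
    return {t: matched[t] / total[t] for t in total}


# Toy graphs, for illustration only.
kg_a = [("gene", "G1"), ("gene", "G2"),
        ("disease", "D1"), ("disease", "D2"),
        ("disease", "D3"), ("disease", "D4")]
kg_b = [("gene", "G1"), ("gene", "G2"), ("disease", "D1")]

a_to_b = per_type_coverage(kg_a, kg_b)  # coverage seen from the kg_a side
b_to_a = per_type_coverage(kg_b, kg_a)  # coverage seen from the kg_b side
print(a_to_b, b_to_a)
```

Coverage is directional: in this toy case every kg_b disease is found in kg_a, but only a quarter of kg_a diseases are found in kg_b, which a single aggregate match rate would hide.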
Rich, C. C. D.; Bang, E. J.; Bair, A. B.; Richardson, B. E.; Millington, J. L.; Bates, B. A.; Davis, M. F.; Bailey, M. H.
Background: The All of Us Research Program represents a rich resource for cancer epidemiology research, with over 400,000 participants with whole genome sequences linked to electronic health records (EHR). Large cancer datasets often focus exclusively on cases without controls and neglect pre-diagnosis healthcare occurrences. Here, we perform a phenome-wide association study (PheWAS) of EHR data at least 1 year pre-diagnosis between cancer cases and matched controls, revealing co-occurring and mutually exclusive phenotypes. Methods: We identified 55,000+ cancer cases across 21 cancer types in All of Us version 8. To eliminate age-related confounding, we implemented a two-stage matching and censoring strategy: loose matching on demographics to establish index dates and cohort comparability, followed by right-censoring of EHR data (excluding 1 year pre-diagnosis/index), then 1:2 matching to address residual demographic imbalance. We tested associations between 23,193 cancer cases, 46,386 matched controls and approximately 1,600 clinical phenotypes using logistic regression adjusted for sex at birth, self-reported race, age at diagnosis/index date, and two censored EHR metrics: observation window and unique condition count, with Bonferroni correction for multiple testing. Results: Our analysis identified 232 significantly associated phenotypes, confirming established cancer risk factors including elevated prostate specific antigen (OR = 2.92, 95% CI: 2.65-3.23; p-value = 1.8 × 10^-101) and multinodular goiter (OR = 1.73, 95% CI: 1.56-1.91; p-value = 6.7 × 10^-27). Further investigation into the relationship between several phenotypes with seeming inverse effects is warranted. Conclusions: This PheWAS of EHR data at least 1 year pre-diagnosis leveraged the diversity of All of Us to examine how clinical phenotypes prior to cancer diagnosis vary across cancer types and racial groups. 
Our findings validate All of Us as a robust platform for cancer epidemiology research, confirming established risk factors at scale across diverse populations. This work provides methodological insights for EHR-based susceptibility analyses and demonstrates the value of agnostic phenome-wide approaches for generating hypotheses in precision medicine.
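The Bonferroni step named in the Methods can be sketched as follows. The family-wise alpha of 0.05 is an assumption, since the abstract does not state it, and the test count of 1,600 is the approximate figure it gives. The function names and the third phenotype are hypothetical; the first two p-values are those quoted in the Results.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Keep the phenotypes whose p-value falls below alpha / n_tests."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {name: p for name, p in p_values.items() if p < threshold}


p_values = {
    "elevated prostate specific antigen": 1.8e-101,  # quoted in the abstract
    "multinodular goiter": 6.7e-27,                  # quoted in the abstract
    "hypothetical phenotype": 1e-3,                  # made up for contrast
}
# alpha = 0.05 is assumed; about 1,600 phenotypes were tested.
kept = bonferroni_significant(p_values, n_tests=1600)
print(sorted(kept))
```

With these assumptions the per-test threshold is 0.05 / 1600, so a phenotype with a nominal p-value of 0.001 would not count as significant even though it passes an uncorrected 0.05 cutoff.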